STRING: finding tandem repeats in DNA sequences

Authors

  • Valerio Parisi
  • Valeria De Fonzo
  • Filippo Aluffi-Pentini
Abstract

MOTIVATION AND RESULTS The importance of tandem repeats in some genomes is now well established. We have reported elsewhere some interesting new results obtained with a preliminary program for finding tandem repeats in DNA sequences, together with a brief description of the basic ideas of the algorithm. Here we describe a completely new program based only in part on those ideas and briefly discuss the interpretation of the results. By way of example, we provide a few novel results on the parasites responsible for two re-emerging diseases, Plasmodium falciparum and Mycobacterium tuberculosis. Our program is portable, effective, powerful and fast: it runs on current desktop computers and finds all significant tandem repeats, even in the longest sequence segments in databases (up to millions of bases), in a short time (minutes).

AVAILABILITY An academic version of the algorithm (full source listing in standard C language) can be freely downloaded (http://www.caspur.it/~castri/STRING/).

SUPPLEMENTARY INFORMATION Some illustrative figures and some sample results are provided as supplementary material at: http://www.caspur.it/~castri/STRING/


Similar resources

Short Tandem Repeats Detection in DNA Sequences Using

Identifying short tandem repeats in DNA sequences is a challenging problem for scientists and engineers. Detecting short tandem repeats is also an important part of gene annotation, and it is useful for identifying various hereditary diseases, establishing human identity, etc. Several methods have been developed to find short tandem repeats, and the...


Linear time algorithms for finding and representing all the tandem repeats in a string

A tandem repeat (or square) is a string aa, where a is a non-empty string. We present an O(|S|)-time algorithm that operates on the suffix tree T(S) for a string S, finding and marking the endpoint in T(S) of every tandem repeat that occurs in S. This decorated suffix tree implicitly represents all occurrences of tandem repeats in S, and can be used to efficiently solve many questions concernin...
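
The definition of a square given above can be checked directly with a brute-force scan. The sketch below is only an illustration of that definition: it is not the linear-time suffix-tree algorithm the abstract describes, nor the STRING program, and the function name `find_squares` is a hypothetical choice made here. It compares each candidate unit with the substring that immediately follows it, so its running time grows far faster than linear in the length of the input.

```python
def find_squares(s, min_unit=1):
    """Return (start, unit_length) for every exact square s[start:start + 2*unit_length].

    Brute-force illustration of the definition only; this is NOT a
    linear-time method and the function name is hypothetical.
    """
    hits = []
    n = len(s)
    # Try every possible unit length, from shortest to longest.
    for unit in range(min_unit, n // 2 + 1):
        # Try every start position that leaves room for two copies.
        for start in range(0, n - 2 * unit + 1):
            first = s[start:start + unit]
            second = s[start + unit:start + 2 * unit]
            if first == second:
                hits.append((start, unit))
    return hits


if __name__ == "__main__":
    for start, unit in find_squares("TTACGACGTT"):
        print(start, unit, "TTACGACGTT"[start:start + 2 * unit])
```

Each reported pair gives the 0-based start of a square and the length of its repeated unit; overlapping occurrences are all listed, so a run such as AAA yields two squares with unit length 1.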


Finding Approximate Tandem Repeats with the Burrows-Wheeler Transform

Approximate tandem repeats in a genomic sequence are two or more contiguous, similar copies of a pattern of nucleotides. They are used in DNA mapping, the study of molecular evolution mechanisms, forensic analysis and research into the diagnosis of inherited diseases. Their functions are still being investigated and are not well defined, but growing biological databases together with tools for identificatio...


Direct mapping of symbolic DNA sequence into frequency domain in global repeat map algorithm

The main feature of the global repeat map (GRM) algorithm (www.hazu.hr/grm/software/win/grm2012.exe) is its ability to identify a broad variety of repeats of unbounded length that can be arbitrarily distant in sequences as large as human chromosomes. The efficacy is due to the use of a complete set of a K-string ensemble, which enables a new method of direct mapping of symbolic DNA sequence into frequ...


An efficient algorithm for finding short approximate non-tandem repeats

We study the problem of approximate non-tandem repeat extraction. Given a long subject string S of length N over a finite alphabet Sigma and a threshold D, we would like to find all short substrings of S of length P that repeat with at most D differences, i.e., insertions, deletions, and mismatches. We give a careful theoretical characterization of the set of seeds (i.e., some maximal exact rep...



Journal:
  • Bioinformatics

Volume 19, Issue 14

Pages: -

Publication date: 2003